Fast Context Adaptation via Meta-Learning
We propose CAVIA for meta-learning, a simple extension to MAML that is less
prone to meta-overfitting, easier to parallelise, and more interpretable. CAVIA
partitions the model parameters into two parts: context parameters that serve
as additional input to the model and are adapted on individual tasks, and
shared parameters that are meta-trained and shared across tasks. At test time,
only the context parameters are updated, leading to a low-dimensional task
representation. We show empirically that CAVIA outperforms MAML for regression,
classification, and reinforcement learning. Our experiments also highlight
weaknesses in current benchmarks, in that the amount of adaptation needed in
some cases is small.
Comment: Published at the International Conference on Machine Learning (ICML) 201
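To make the partitioning concrete, here is a minimal sketch of CAVIA-style test-time adaptation on 1D linear regression (illustrative only; the paper uses neural networks and meta-trains the shared parameters across tasks). The context parameters phi are appended to the model input, and only phi is updated on a new task.

```python
import numpy as np

def predict(x, theta, phi):
    # The model sees the input x concatenated with the context parameters phi.
    return np.concatenate([x, phi]) @ theta

def adapt_context(task_x, task_y, theta, phi, lr=0.1, steps=20):
    # Gradient descent on phi only; the shared parameters theta stay fixed,
    # as CAVIA does at test time.
    phi = phi.copy()
    d = len(task_x[0])
    for _ in range(steps):
        grad = np.zeros_like(phi)
        for x, y in zip(task_x, task_y):
            err = predict(x, theta, phi) - y
            grad += 2.0 * err * theta[d:]  # d(err^2)/d(phi)
        phi -= lr * grad / len(task_x)
    return phi
```

Because only the low-dimensional phi changes per task, the adapted values double as a compact task representation.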
TACO: Learning Task Decomposition via Temporal Alignment for Control
Many advanced Learning from Demonstration (LfD) methods consider the
decomposition of complex, real-world tasks into simpler sub-tasks. By reusing
the corresponding sub-policies within and between tasks, they provide training
data for each policy from different high-level tasks and compose them to
perform novel ones. Existing approaches to modular LfD focus either on learning
a single high-level task or depend on domain knowledge and temporal
segmentation. In contrast, we propose a weakly supervised, domain-agnostic
approach based on task sketches, which include only the sequence of sub-tasks
performed in each demonstration. Our approach simultaneously aligns the
sketches with the observed demonstrations and learns the required sub-policies.
This improves generalisation in comparison to separate optimisation procedures.
We evaluate the approach on multiple domains, including a simulated 3D robot
arm control task using purely image-based observations. The results show that
our approach performs commensurately with fully supervised approaches, while
requiring significantly less annotation effort.
Comment: 12 pages. Published at ICML 201
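The temporal-alignment idea can be sketched with a simple monotone dynamic program that assigns each timestep of a demonstration to a position in the task sketch. This is illustrative only: the per-step costs are assumed given here, whereas TACO learns them jointly with the sub-policies.

```python
def align(cost):
    """cost[t][k]: cost of assigning timestep t to sketch position k.
    Returns a monotone alignment ending at the last sketch position."""
    T, K = len(cost), len(cost[0])
    INF = float("inf")
    dp = [[INF] * K for _ in range(T)]
    dp[0][0] = cost[0][0]  # alignments start at the first sub-task
    for t in range(1, T):
        for k in range(K):
            stay = dp[t - 1][k]
            advance = dp[t - 1][k - 1] if k > 0 else INF
            dp[t][k] = cost[t][k] + min(stay, advance)
    # Backtrack from the last sketch position at the last timestep.
    k, path = K - 1, [K - 1]
    for t in range(T - 1, 0, -1):
        if k > 0 and dp[t - 1][k - 1] <= dp[t - 1][k]:
            k -= 1
        path.append(k)
    path.reverse()
    return path, dp[T - 1][K - 1]
```

For example, with costs favouring sub-task 0 early and sub-task 1 late, the DP recovers the segmentation without any per-timestep labels.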
VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning
Trading off exploration and exploitation in an unknown environment is key to
maximising expected return during learning. A Bayes-optimal policy, which does
so optimally, conditions its actions not only on the environment state but on
the agent's uncertainty about the environment. Computing a Bayes-optimal policy
is however intractable for all but the smallest tasks. In this paper, we
introduce variational Bayes-Adaptive Deep RL (variBAD), a way to meta-learn to
perform approximate inference in an unknown environment, and incorporate task
uncertainty directly during action selection. In a grid-world domain, we
illustrate how variBAD performs structured online exploration as a function of
task uncertainty. We further evaluate variBAD on MuJoCo domains widely used in
meta-RL and show that it achieves higher online return than existing methods.
Comment: Published at ICLR 202
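A toy sketch of the underlying idea of conditioning actions on task uncertainty (not the variBAD architecture, which meta-learns an approximate posterior with a variational encoder): maintain a belief over a discrete set of reward hypotheses and update it from observed rewards, so the policy can take the belief as an extra input.

```python
import numpy as np

def update_belief(belief, arm, reward, likelihoods):
    # likelihoods[h][arm] = P(reward = 1 | hypothesis h, arm).
    # Exact Bayes update over the discrete hypothesis set.
    post = np.array([
        b * (likelihoods[h][arm] if reward == 1 else 1 - likelihoods[h][arm])
        for h, b in enumerate(belief)
    ])
    return post / post.sum()
```

A Bayes-adaptive policy would then act on the pair (state, belief), exploring more when the belief is close to uniform and exploiting once it concentrates.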
Hierarchical Imitation Learning for Stochastic Environments
Many applications of imitation learning require the agent to generate the
full distribution of behaviour observed in the training data. For example, to
evaluate the safety of autonomous vehicles in simulation, accurate and diverse
behaviour models of other road users are paramount. Existing methods that
improve this distributional realism typically rely on hierarchical policies.
These condition the policy on types such as goals or personas that give rise to
multi-modal behaviour. However, such methods are often inappropriate for
stochastic environments where the agent must also react to external factors:
because agent types are inferred from the observed future trajectory during
training, these environments require that the contributions of internal and
external factors to the agent behaviour are disentangled and only internal
factors, i.e., those under the agent's control, are encoded in the type.
Encoding future information about external factors leads to inappropriate agent
reactions during testing, when the future is unknown and types must be drawn
independently from the actual future. We formalize this challenge as
distribution shift in the conditional distribution of agent types under
environmental stochasticity. We propose Robust Type Conditioning (RTC), which
eliminates this shift with adversarial training under randomly sampled types.
Experiments on two domains, including the large-scale Waymo Open Motion
Dataset, show improved distributional realism while maintaining or improving
task performance compared to state-of-the-art baselines.
Comment: Published at IROS'2
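The distribution shift can be illustrated numerically with a toy model (hypothetical names, not the RTC method itself): let g be an internal factor under the agent's control, e an external event, and let the training-time "type" z be inferred from the observed future, which depends on both.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
g = rng.integers(0, 2, n)   # internal factor (the agent's goal)
e = rng.integers(0, 2, n)   # external factor, not under the agent's control
z = g | e                   # type inferred from the future entangles e

# Training-time conditional distribution of the type given the external event:
p_z_given_e0 = z[e == 0].mean()   # ~0.5
p_z_given_e1 = z[e == 1].mean()   # 1.0
# At test time z must be sampled independently of e (marginal ~0.75), so a
# policy trained under p(z | e) faces a shift in its type distribution.
```

Encoding only the internal factor (z = g) would remove this dependence and eliminate the shift, which is what RTC enforces through adversarial training.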
Learning from Demonstration in the Wild
Learning from demonstration (LfD) is useful in settings where hand-coding
behaviour or a reward function is impractical. It has succeeded in a wide range
of problems but typically relies on manually generated demonstrations or
specially deployed sensors and has not generally been able to leverage the
copious demonstrations available in the wild: those that capture behaviours
that were occurring anyway using sensors that were already deployed for another
purpose, e.g., traffic camera footage capturing demonstrations of natural
behaviour of vehicles, cyclists, and pedestrians. We propose Video to Behaviour
(ViBe), a new approach to learn models of behaviour from unlabelled raw video
data of a traffic scene collected from a single, monocular, initially
uncalibrated camera with ordinary resolution. Our approach calibrates the
camera, detects relevant objects, tracks them through time, and uses the
resulting trajectories to perform LfD, yielding models of naturalistic
behaviour. We apply ViBe to raw videos of a traffic intersection and show that
it can learn purely from videos, without additional expert knowledge.
Comment: Accepted to the IEEE International Conference on Robotics and Automation (ICRA) 2019; extended version with appendix
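The tracking-to-trajectories step of such a pipeline can be sketched as greedy nearest-neighbour association of per-frame detections into tracks. This is a hypothetical simplification: the actual ViBe pipeline also calibrates the camera and runs a learned detector, both omitted here.

```python
import numpy as np

def link_detections(frames, max_dist=2.0):
    """frames: list of per-frame detection lists, each detection an (x, y) pair.
    Returns a list of trajectories (lists of points)."""
    tracks = []   # all trajectories, finished and active
    active = []   # indices into tracks still being extended
    for dets in frames:
        dets = [np.asarray(d, dtype=float) for d in dets]
        used, next_active = set(), []
        for ti in active:
            last = tracks[ti][-1]
            # Greedily pick the nearest unused detection within max_dist.
            best, best_d = None, max_dist
            for j, d in enumerate(dets):
                if j in used:
                    continue
                dist = np.linalg.norm(d - last)
                if dist < best_d:
                    best, best_d = j, dist
            if best is not None:
                tracks[ti].append(dets[best])
                used.add(best)
                next_active.append(ti)
        # Unmatched detections start new trajectories.
        for j, d in enumerate(dets):
            if j not in used:
                tracks.append([d])
                next_active.append(len(tracks) - 1)
        active = next_active
    return tracks
```

The resulting trajectories are what the LfD stage consumes as demonstrations of naturalistic behaviour.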